
chore: Add primary key constraints for TPC-H, TPC-DS #22646

Open
neilconway wants to merge 2 commits into apache:main from neilconway:neilc/chore-tpc-constraints


Contributor

@neilconway neilconway commented May 30, 2026

Which issue does this PR close?

Rationale for this change

The TPC-DS and TPC-H specifications define primary keys, but our TPC-DS and TPC-H schema definitions previously omitted those constraints. Including them lets the query optimizer generate better plans (e.g., by exploiting functional dependencies); it also brings the benchmark setup closer to a realistic TPC-DS/H benchmark run.
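To illustrate why a primary key helps the optimizer, here is a minimal sketch of one rewrite that functional dependencies make possible: once a GROUP BY includes every primary key column, the remaining grouping columns are determined by the key and can be dropped. The helpers `group_by_determines_row` and `simplify_group_by` are hypothetical names for this example only; this is not DataFusion's optimizer code, and the plan changes in this PR may come from different rules.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): true when `group_by`
/// contains every column of a non-empty `primary_key`, so each group
/// holds exactly one row of the table.
fn group_by_determines_row(primary_key: &[&str], group_by: &[&str]) -> bool {
    if primary_key.is_empty() {
        return false;
    }
    let grouped: HashSet<&str> = group_by.iter().copied().collect();
    primary_key.iter().all(|col| grouped.contains(col))
}

/// Hypothetical helper: drops grouping columns that are functionally
/// determined by the primary key. If the key is not fully covered,
/// the grouping columns are returned unchanged.
fn simplify_group_by<'a>(primary_key: &[&'a str], group_by: &[&'a str]) -> Vec<&'a str> {
    if !group_by_determines_row(primary_key, group_by) {
        return group_by.to_vec();
    }
    let key: HashSet<&str> = primary_key.iter().copied().collect();
    group_by.iter().copied().filter(|col| key.contains(col)).collect()
}
```

For example, with `o_orderkey` as the primary key of `orders`, grouping by `o_orderkey, o_orderdate, o_totalprice` reduces to grouping by `o_orderkey` alone. Grouping `lineitem` by `l_orderkey, l_shipdate` is left unchanged, because the composite key `(l_orderkey, l_linenumber)` is not fully covered.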

To enable this, we need to fix MemTable::load: a MemTable can be constructed with constraints (via with_constraints), but those constraints were not attached to the table returned by MemTable::load.
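The shape of the bug can be shown with a toy model. `ToyTable` and `Constraints` below are hypothetical stand-ins written for this description, not DataFusion types, and the real MemTable::load has a different signature. The point is only that a load step which copies the data must also copy the constraints.

```rust
/// Toy stand-in for table constraints: indices of the primary key columns.
#[derive(Debug, Clone, PartialEq, Default)]
struct Constraints {
    primary_key: Vec<usize>,
}

/// Toy stand-in for an in-memory table: partitions of rows plus constraints.
#[derive(Debug, Clone, PartialEq)]
struct ToyTable {
    partitions: Vec<Vec<i64>>,
    constraints: Constraints,
}

impl ToyTable {
    /// Creates a table with no constraints.
    fn new(partitions: Vec<Vec<i64>>) -> Self {
        ToyTable {
            partitions,
            constraints: Constraints::default(),
        }
    }

    /// Builder-style setter, mirroring the idea of `with_constraints`.
    fn with_constraints(mut self, constraints: Constraints) -> Self {
        self.constraints = constraints;
        self
    }

    /// Copies the data of `source` into a new table. The buggy shape of
    /// this function would stop after `ToyTable::new(partitions)` and
    /// silently return a table without the source's constraints.
    fn load(source: &ToyTable) -> ToyTable {
        let partitions = source.partitions.clone();
        ToyTable::new(partitions).with_constraints(source.constraints.clone())
    }
}
```

A unit test for this behavior builds a source table with a primary key, loads it, and asserts that the loaded table reports the same constraints as the source.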

There is some duplication here: we define two copies of the primary keys of both TPC-DS and TPC-H, because benchmarks and test-utils can't easily share code. This could be improved but I'll defer that for now.

What changes are included in this PR?

  • Fix MemTable::load to include constraints on the newly loaded table
  • Refactor MemTable::load to use collect_partitioned
  • Add unit tests for new MemTable::load behavior
  • Add TPC-DS and TPC-H primary keys to benchmarks
  • Add TPC-DS and TPC-H primary keys to test-utils
  • Add TPC-H primary keys to SLT schema definitions
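For reference, the TPC-H primary keys can be written as a simple lookup. This is a sketch only: `tpch_primary_key` is a hypothetical function, not the definition added to benchmarks or test-utils, and it assumes lowercase table names and the column names used by the TPC-H specification.

```rust
/// Hypothetical lookup of TPC-H primary key columns by table name.
/// Returns an empty slice for tables that are not part of TPC-H.
fn tpch_primary_key(table: &str) -> &'static [&'static str] {
    match table {
        "region" => &["r_regionkey"],
        "nation" => &["n_nationkey"],
        "supplier" => &["s_suppkey"],
        "customer" => &["c_custkey"],
        "part" => &["p_partkey"],
        // partsupp and lineitem have composite (two-column) keys.
        "partsupp" => &["ps_partkey", "ps_suppkey"],
        "orders" => &["o_orderkey"],
        "lineitem" => &["l_orderkey", "l_linenumber"],
        _ => &[],
    }
}
```

Note that two of the eight tables have composite keys, so any schema helper needs to accept a list of columns per table rather than a single column.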

Are these changes tested?

Yes. New unit tests cover the MemTable::load constraint behavior, and the SLT fixtures are updated to reflect the changed TPC schemas and plans.

Are there any user-facing changes?

No.

github-actions bot added the core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), and catalog (Related to the catalog crate) labels on May 30, 2026
neilconway marked this pull request as ready for review on May 30, 2026, 14:59


Development

Successfully merging this pull request may close these issues.

Add primary keys to TPC-H, TPC-DS schemas
